232 research outputs found

    SAMStat: monitoring biases in next generation sequencing data

    Motivation: The sequence alignment/map format (SAM) is a commonly used format to store the alignments between millions of short reads and a reference genome. Often certain positions within the reads are inherently more likely to contain errors due to the protocols used to prepare the samples. Such biases can have adverse effects on both mapping rate and accuracy. To understand the relationship between potential protocol biases and poor mapping we wrote SAMstat, a simple C program plotting nucleotide overrepresentation and other statistics in mapped and unmapped reads in a concise html page. Collecting such statistics also makes it easy to highlight problems in the data processing and enables non-experts to track data quality over time

    Discovery and Analysis of Evolutionarily Conserved Intronic Splicing Regulatory Elements

    Knowledge of the functional cis-regulatory elements that regulate constitutive and alternative pre-mRNA splicing is fundamental for biology and medicine. Here we undertook a genome-wide comparative genomics approach using available mammalian genomes to identify conserved intronic splicing regulatory elements (ISREs). Our approach yielded 314 ISREs, and insertions of ~70 ISREs between competing splice sites demonstrated that 84% of ISREs altered 5′ and 94% altered 3′ splice site choice in human cells. Consistent with our experiments, comparisons of ISREs to known splicing regulatory elements revealed that 40%–45% of ISREs might have dual roles as exonic splicing silencers. Supporting a role for ISREs in alternative splicing, we found that 30%–50% of ISREs were enriched near alternatively spliced (AS) exons, and included almost all known binding sites of tissue-specific alternative splicing factors. Further, we observed that genes harboring ISRE-proximal exons have biases for tissue expression and molecular functions that are ISRE-specific. Finally, we discovered that for Nova1, neuronal PTB, hnRNP C, and FOX1, the most frequently occurring ISRE proximal to an alternative conserved exon in the splicing factor strongly resembled its own known RNA binding site, suggesting a novel application of ISRE density and the propensity for splicing factors to auto-regulate to associate RNA binding sites to splicing factors. Our results demonstrate that ISREs are crucial building blocks in understanding general and tissue-specific AS regulation and the biological pathways and functions regulated by these AS events

    The RIKEN integrated database of mammals

    The RIKEN integrated database of mammals (http://scinets.org/db/mammal) is the official undertaking to integrate its mammalian databases produced from multiple large-scale programs that have been promoted by the institute. The database integrates not only RIKEN’s original databases, such as FANTOM, the ENU mutagenesis program, the RIKEN Cerebellar Development Transcriptome Database and the Bioresource Database, but also imported data from public databases, such as Ensembl, MGI and biomedical ontologies. Our integrated database has been implemented on the infrastructure of publication medium for databases, termed SciNetS/SciNeS, or the Scientists’ Networking System, where the data and metadata are structured as a semantic web and are downloadable in various standardized formats. The top-level ontology-based implementation of mammal-related data directly integrates the representative knowledge and individual data records in existing databases to ensure advanced cross-database searches and reduced unevenness of the data management operations. Through the development of this database, we propose a novel methodology for the development of standardized comprehensive management of heterogeneous data sets in multiple databases to improve the sustainability, accessibility, utility and publicity of the data of biomedical information

    Distinct roles of the RasGAP family proteins in C. elegans associative learning and memory

    The Ras GTPase activating proteins (RasGAPs) are regulators of the conserved Ras/MAPK pathway. Various roles of some of the RasGAPs in learning and memory have been reported in different model systems, yet no comprehensive study has characterized all gap genes in any organism. Here, using reverse genetics and neurobehavioural tests, we studied the role of all known genes of the rasgap family in C. elegans in associative learning and memory. We demonstrated that their proteins are implicated in different parts of the learning and memory processes. We show that gap-1 contributes redundantly with gap-3 to the chemosensation of volatile compounds, that gap-1 plays a major role in associative learning, and that gap-2 and gap-3 are predominantly required for short- and long-term associative memory. Our results also suggest that the C. elegans Ras orthologue let-60 is involved in multiple processes during learning and memory. Thus, we show that the different classes of RasGAP proteins are all involved in cognitive function and their complex interplay ensures the proper formation and storage of novel information in C. elegans

    A new approach for measuring the muon anomalous magnetic moment and electric dipole moment

    This paper introduces a new approach to measure the muon magnetic moment anomaly aμ = (g - 2)/2 and the muon electric dipole moment (EDM) dμ at the J-PARC muon facility. The goal of our experiment is to measure aμ and dμ using an independent method with a factor of 10 lower muon momentum, and a factor of 20 smaller diameter storage-ring solenoid compared with previous and ongoing muon g - 2 experiments with unprecedented quality of the storage magnetic field. Additional significant differences from the present experimental method include a factor of 1000 smaller transverse emittance of the muon beam (reaccelerated thermal muon beam), its efficient vertical injection into the solenoid, and tracking each decay positron from muon decay to obtain its momentum vector. The precision goal for aμ is a statistical uncertainty of 450 parts per billion (ppb), similar to the present experimental uncertainty, and a systematic uncertainty less than 70 ppb. The goal for EDM is a sensitivity of 1.5 × 10^-21 e·cm

    Update of the FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation

    The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), sequencing mRNA 5′-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments and ChIP–chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as resource for the scientific community (http://fantom.gsc.riken.jp/4/). Additionally, we assembled a rich set of derived analysis results including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release

    Large-scale clustering of CAGE tag expression data

    Background: Recent analyses have suggested that many genes possess multiple transcription start sites (TSSs) that are differentially utilized in different tissues and cell lines. We have identified a huge number of TSSs mapped onto the mouse genome using the cap analysis of gene expression (CAGE) method. The standard hierarchical clustering algorithm, which gives us easily understandable graphical tree images, has difficulties in processing such huge amounts of TSS data and a better method to calculate and display the results is needed. Results: We use a combination of hierarchical and non-hierarchical clustering to cluster expression profiles of TSSs based on a large amount of CAGE data to profit from the best of both methods. We processed the genome-wide expression data, including 159,075 TSSs derived from 127 RNA samples of various organs of mouse, and succeeded in categorizing them into 70-100 clusters. The clusters exhibited intriguing biological features: a cluster supergroup with a ubiquitous expression profile, tissue-specific patterns, a distinct distribution of non-coding RNA and functional TSS groups. Conclusion: Our approach succeeded in greatly reducing the calculation cost, and is an appropriate solution for analyzing large-scale TSS usage data

    Identification and Characterization of Full-Length cDNAs in Channel Catfish (Ictalurus punctatus) and Blue Catfish (Ictalurus furcatus)

    Background: Genome annotation projects, gene functional studies, and phylogenetic analyses for a given organism all greatly benefit from access to a validated full-length cDNA resource. While increasingly common in model species, full-length cDNA resources in aquaculture species are scarce. Methodology and Principal Findings: Through in silico analysis of catfish (Ictalurus spp.) ESTs, a total of 10,037 channel catfish and 7,382 blue catfish cDNA clones were identified as potentially encoding full-length cDNAs. Of this set, a total of 1,169 channel catfish and 933 blue catfish full-length cDNA clones were selected for re-sequencing to provide additional coverage and ensure sequence accuracy. A total of 1,745 unique gene transcripts were identified from the full-length cDNA set, including 1,064 gene transcripts from channel catfish and 681 gene transcripts from blue catfish, with 416 transcripts shared between the two closely related species. Full-length sequence characteristics (ortholog conservation, UTR length, Kozak sequence, and conserved motifs) of the channel and blue catfish were examined in detail. Comparison of gene ontology composition between full-length cDNAs and all catfish ESTs revealed that the full-length cDNA set is representative of the gene diversity encoded in the catfish transcriptome. Conclusions: This study describes the first catfish full-length cDNA set constructed from several cDNA libraries. The catfish full-length cDNA sequences, and data gleaned from sequence characteristics analysis, will be a valuable resource fo

    Systematic analysis of transcription start sites in avian development

    © 2017 Lizio et al. Cap Analysis of Gene Expression (CAGE) in combination with single-molecule sequencing technology allows precision mapping of transcription start sites (TSSs) and genome-wide capture of promoter activities in differentiated and steady state cell populations. Much less is known about whether TSS profiling can characterize diverse and non-steady state cell populations, such as the approximately 400 transitory and heterogeneous cell types that arise during ontogeny of vertebrate animals. To gain such insight, we used the chick model and performed CAGE-based TSS analysis on embryonic samples covering the full 3-week developmental period. In total, 31,863 robust TSS peaks ( > 1 tag per million [TPM]) were mapped to the latest chicken genome assembly, of which 34% to 46% were active in any given developmental stage. ZENBU, a web-based, open-source platform, was used for interactive data exploration. TSSs of genes critical for lineage differentiation could be precisely mapped and their activities tracked throughout development, suggesting that non-steady state and heterogeneous cell populations are amenable to CAGE-based transcriptional analysis. Our study also uncovered a large set of extremely stable housekeeping TSSs and many novel stage-specific ones. We furthermore demonstrated that TSS mapping could expedite motif-based promoter analysis for regulatory modules associated with stage-specific and housekeeping genes. Finally, using Brachyury as an example, we provide evidence that precise TSS mapping in combination with Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-on technology enables us, for the first time, to efficiently target endogenous avian genes for transcriptional activation. Taken together, our results represent the first report of genome-wide TSS mapping in birds and the first systematic developmental TSS analysis in any amniote species (birds and mammals). 
    By facilitating promoter-based molecular analysis and genetic manipulation, our work also underscores the value of avian models in unravelling the complex regulatory mechanism of cell lineage specification during amniote development

    One-Step Detection of the 2009 Pandemic Influenza A(H1N1) Virus by the RT-SmartAmp Assay and Its Clinical Validation

    Background: In 2009, a pandemic (pdm) influenza A(H1N1) virus infection quickly circulated globally resulting in about 18,000 deaths around the world. In Japan, infected patients accounted for 16% of the total population. The possibility of human-to-human transmission of highly pathogenic novel influenza viruses is becoming a fear for human health and society. Methodology: To address the clinical need for rapid diagnosis, we have developed a new method, the “RT-SmartAmp assay”, to rapidly detect the 2009 pandemic influenza A(H1N1) virus from patient swab samples. The RT-SmartAmp assay comprises both reverse transcriptase (RT) and isothermal DNA amplification reactions in one step, where RNA extraction and PCR reaction are not required. We used an exciton-controlled hybridization-sensitive fluorescent primer to specifically detect the HA segment of the 2009 pdm influenza A(H1N1) virus within 40 minutes without cross-reacting with the seasonal A(H1N1), A(H3N2), or B-type (Victoria) viruses. Results and Conclusions: We evaluated the RT-SmartAmp method in clinical research carried out in Japan during a pandemic period of October 2009 to January 2010. A total of 255 swab samples were collected from outpatients with influenza-like illness at three hospitals and eleven clinics located in the Tokyo and Chiba areas in Japan. The 2009 pdm influenza A(H1N1) virus was detected by the RT-SmartAmp assay, and the detection results were subsequently compared with data of current influenza diagnostic tests (lateral flow immuno-chromatographic tests) and viral genome sequence analysis. In conclusion, by the RT-SmartAmp assay we could detect the 2009 pdm influenza A(H1N1) virus in patients' swab samples even in early stages after the initial onset of influenza symptoms. Thus, the RT-SmartAmp assay is considered to provide a simple and practical tool to rapidly detect the 2009 pdm influenza A(H1N1) virus.